Static LU Decomposition on Heterogeneous Platforms
Authors
Abstract
In this paper, the authors deal with algorithmic issues on heterogeneous platforms. They concentrate on dense linear algebra kernels, such as matrix multiplication or LU decomposition. Block-cyclic distribution techniques used in ScaLAPACK are no longer sufficient to balance the load among processors running at different speeds. The main result of this paper is to provide a static data distribution scheme that leads to an asymptotically perfect load balancing for LU decomposition, thereby providing solid foundations toward the design of a cluster-oriented version of ScaLAPACK.
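To make the load-balancing claim concrete, here is a minimal sketch of a static one-dimensional distribution of column blocks over processors of different speeds. It is an illustration under stated assumptions, not necessarily the scheme of the paper: the function names `static_allocation` and `makespan`, the cycle times, and the block count are all hypothetical. Each block goes to the processor that would finish soonest after receiving it, and the result is compared with a plain cyclic assignment of the kind a block-cyclic layout would produce.

```python
def static_allocation(cycle_times, num_blocks):
    """Greedy static allocation (illustrative, hypothetical helper).

    cycle_times[i] is the time processor i needs to process one block.
    Each block is given to the processor whose finish time would be
    smallest after receiving it. Returns (owner of each block, block
    count per processor).
    """
    counts = [0] * len(cycle_times)
    order = []
    for _ in range(num_blocks):
        # Pick the processor minimising (blocks held + 1) * cycle time.
        p = min(range(len(cycle_times)),
                key=lambda i: (counts[i] + 1) * cycle_times[i])
        counts[p] += 1
        order.append(p)
    return order, counts


def makespan(cycle_times, counts):
    """Time at which the slowest-finishing processor completes its blocks."""
    return max(c * t for c, t in zip(counts, cycle_times))


# Assumed example: three processors, the first is 4x faster than the last.
cycle_times = [1, 2, 4]
num_blocks = 14

order, counts = static_allocation(cycle_times, num_blocks)

# Plain cyclic assignment: block k goes to processor k mod p.
cyclic_counts = [0] * len(cycle_times)
for k in range(num_blocks):
    cyclic_counts[k % len(cycle_times)] += 1

print("speed-aware counts:", counts, "makespan:", makespan(cycle_times, counts))
print("cyclic counts:     ", cyclic_counts, "makespan:", makespan(cycle_times, cyclic_counts))
```

In this assumed setting the speed-aware assignment gives every processor the same finish time, while the cyclic one is bound by the slowest processor. For LU decomposition the active part of the matrix shrinks as the factorization proceeds, so a distribution would also need to keep each remaining trailing submatrix balanced; one possible approach is to apply such an allocation order starting from the last column block, which this sketch does not attempt.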
Similar resources
Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures
Most recent HPC platforms have heterogeneous nodes composed of a combination of multi-core CPUs and accelerators, like GPUs. Scheduling on such architectures relies on a static partitioning and cost model. In this paper, we present a locality-aware work stealing scheduler for multi-CPU and multi-GPU architectures, which relies on the XKaapi runtime system. We show performance results on two den...
A group block distribution strategy for a heterogeneous machine
This paper discusses the data distribution problem for inherently sequential algorithms, such as the LU factorization in linear algebra, when computed on heterogeneous machines. These algorithms present additional difficulties to optimize the processing time due to the fact that the computational load for data matrix columns increases with their index, requiring a fine tuned load assignment and...
Special issue on parallel matrix algorithms and applications
This issue of the journal contains 11 articles selected from invited and contributed presentations made at the Workshop on Parallel Matrix Algorithms and Applications, which was held in Neuchâtel, Switzerland, on August 18–20, 2000. The workshop was well attended with participants from all over Europe and the United States. Papers presented at the workshop covered many aspects of parallel ...
Parallelization of the LU Decomposition on Heterogeneous Systems
With the appearance of GPUs as valid platforms, not only for graphics computation but also for general-purpose computation, applications that exploit hybrid/heterogeneous systems can be made available to the mass market due to the widespread availability of these systems. Correct distribution of the workload of these applications can lead to significant performance boosts to complex applicati...
Performance Predictions of Multilevel Communication Optimal LU and QR Factorizations on Hierarchical Platforms
In this paper we study the performance of two classical dense linear algebra algorithms, the LU and the QR factorizations, on multilevel hierarchical platforms. We note that we focus on multilevel QR factorization, and give a brief description of the multilevel LU factorization. We first introduce a performance model called Hierarchical Cluster Platform (Hcp), encapsulating the characteristics ...
Journal title:
- IJHPCA
Volume 15
Publication date: 2001